Abstract

The key point is to define the statistical model describing the data
distribution. It then turns out that the characteristics related to the so-called
Inverse Tully-Fisher relation and to the Direct relation are maximum likelihood
(ML) estimators of different statistical models. We obtain coherent distance
estimates as long as the same model is used both for the calibration of the TF
relation and for the determination of distances. The choice of the model is
motivated by the robustness of the resulting statistics, which depends on the
observational selection effects.
1. INTRODUCTION



Correcting the biases in galaxy distance estimates is one of the major
problems that must be solved for a better understanding of cosmic velocity
fields (see [...] and P. Teerikorpi, this conference). Any fitting technique is
intimately related to a statistical model. Once this is kept in mind, the slow
convergence of the present debate becomes understandable: arguing over whether to
use the direct Tully-Fisher relation (DTF) or the inverse relation (ITF) reflects an
insufficiently careful formulation of the problem. The obstacle to a consensus
can be overcome by arguing about the model instead of the fitting technique.
Most of the present contribution is a brief presentation of results obtained in [...].

2. BASICS OF BIAS CORRECTION

It is a sensible question whether a statistical estimator (statistic)
actually corresponds to the model parameter for which it was designed.
Generically, a statistic $\hat{\theta}_N$ of a given parameter $\theta$ provides
us with an estimate

$$\hat{\theta}_N = \theta + \epsilon_N \qquad (1)$$

within an (unknown) random error $\epsilon_N$, where $N$ denotes the sample size.
Thus, the accuracy of such an estimate can be discussed only in terms of the
characteristics describing the probability law of $\epsilon_N$. For example, the
smaller the variance of $\epsilon_N$, the more precise the estimate, as long as it
is not biased. By definition, "$\hat{\theta}_N$ is biased when the expected value of
$\epsilon_N$ is not zero". Unbiasedness is a desirable property, but it is not
essential, since it can be reached asymptotically (i.e., for $N \to \infty$).
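As a minimal illustration of Eq. (1), take $\theta = \sigma^2$ of a Gaussian and the ML statistic that divides by $N$. This example is ours and is not taken from the analysis above. The sketch estimates $E(\epsilon_N)$ by Monte Carlo. The exact value, $-\sigma^2/N$, describes a statistic that is biased for finite $N$ but becomes unbiased as $N \to \infty$. The function names, seed and trial counts are arbitrary assumptions.

```python
import random


def ml_variance(sample):
    """ML statistic of a Gaussian variance: divides by N (not N - 1)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / n


def mean_error(n, sigma=1.0, trials=4000, seed=1):
    """Monte Carlo estimate of E(eps_N) = E(theta_hat_N) - theta, with theta = sigma**2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        total += ml_variance(sample) - sigma**2
    return total / trials


if __name__ == "__main__":
    for n in (5, 50, 500):
        # The exact expected error is -sigma**2 / n: nonzero for finite n, zero as n -> infinity.
        print(n, round(mean_error(n), 4), round(-1.0 / n, 4))
```

The bias shrinks like $1/N$, which is the sense in which an ML statistic can be "asymptotically unbiased".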

The typical bias problem in the present field is intimately related to whether
the observational selection effects are correctly taken into account in the
statistical model. In other words, one can obtain unbiased statistics as long as
the probability density (pd) describing the $\epsilon_N$-distribution is known,
which requires a "statistical modeling" of the data. This is the first step toward
understanding any problem involving observations, and once it is taken nothing
prevents us from using solely the maximum likelihood (ML) technique to obtain
suitable statistics. The enormous advantage of this approach is that it
unambiguously provides a unique fitting technique, which saves us from subjective
speculation on diagrams.
2.1 The Statistical Model - The Method



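The pd describing the distribution of observables reads

$$dP_{\rm obs} = \frac{\phi}{P_{\rm th}(\phi)}\,dP_{\rm th}, \qquad (2)$$

where $0 \le \phi \le 1$ is the observational selection function, $dP_{\rm th}$
describes the distribution of the intrinsic variables related to the sources, and
$P_{\rm th}(\phi) = \int \phi\,dP_{\rm th}$ is the normalization factor. Working
hypotheses are required to define the selection function (in terms of observables)
and the theoretical pd $dP_{\rm th}$ (in terms of intrinsic quantities). Hence, for a
sample of $N$ objects we can write the likelihood function†
$\mathcal{L}_{\rm obs} = \mathcal{L}_{\rm th} - N \ln P_{\rm th}(\phi)$. Here
$\mathcal{L}_{\rm th}$ corresponds to the pd $dP_{\rm th}$, and the terms in
$\ln\phi$ are dropped because they do not depend on the model parameters. The ML
statistic is derived from the equation

$$\partial_\theta \mathcal{L}_{\rm obs} = 0. \qquad (3)$$

Note the feature that reveals the presence of biases: a $\theta$-statistic derived
from the equation $\partial_\theta \mathcal{L}_{\rm th} = 0$ differs from
$\hat{\theta}_N$ if $\partial_\theta P_{\rm th}(\phi) \neq 0$.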
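The bias criterion can be made concrete with a toy case of our own, not the paper's. Let $x \sim G(\theta, \sigma)$ with known $\sigma$, observed only when $x < c$, so that $P_{\rm th}(\phi) = \Phi((c-\theta)/\sigma)$ depends on $\theta$. The sketch compares two roots under these assumptions:

- the root of $\partial_\theta \mathcal{L}_{\rm th} = 0$, which is the plain sample mean;
- the root of $\partial_\theta \mathcal{L}_{\rm obs} = 0$, found by bisection.

The names `ml_truncated_mean` and `truncated_sample` and all numerical values are hypothetical.

```python
import math
import random


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    # erfc keeps accuracy in the lower tail
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def ml_truncated_mean(xs, sigma, cut, tol=1e-10):
    """Solve d(theta) L_obs = 0 for x ~ G(theta, sigma) observed only when x < cut.

    L_obs = sum_i -(x_i - theta)**2 / (2 sigma**2) - N ln P_th(phi),
    with P_th(phi) = Phi((cut - theta) / sigma) for phi(x) = Theta(cut - x).
    """
    n = len(xs)
    xbar = sum(xs) / n

    def score(theta):
        z = (cut - theta) / sigma
        # Second term is -N d(theta) ln P_th(phi); it would vanish if P_th did not depend on theta.
        return n * (xbar - theta) / sigma**2 + n * norm_pdf(z) / (sigma * norm_cdf(z))

    # score(xbar) > 0 and the score decreases with theta: bracket the root, then bisect.
    lo, hi = xbar, xbar + sigma
    while score(hi) > 0:
        hi += sigma
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def truncated_sample(theta, sigma, cut, n, seed=2):
    """Draw from G(theta, sigma) and keep only values passing the sharp cut x < cut."""
    rng = random.Random(seed)
    xs = []
    while len(xs) < n:
        x = rng.gauss(theta, sigma)
        if x < cut:
            xs.append(x)
    return xs


if __name__ == "__main__":
    xs = truncated_sample(theta=0.0, sigma=1.0, cut=0.5, n=2000)
    naive = sum(xs) / len(xs)  # root of d(theta) L_th = 0: ignores the normalization
    ml = ml_truncated_mean(xs, sigma=1.0, cut=0.5)
    print(f"naive mean = {naive:.3f}, ML with normalization = {ml:.3f}, true theta = 0")
```

The naive mean lands near $-0.51$, the expected value of the truncated distribution. The full ML root stays close to the true $\theta$ because it includes the $-N\ln P_{\rm th}(\phi)$ term.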

If the sample is not peculiar, the ML statistic $\hat{\theta}_N$ provides us with the
most probable value of $\theta$ within a given accuracy, although it is not
necessarily unbiased. To recover an unbiased estimate, the ML statistic must be
shifted by the expected value of $\epsilon_N$,

$$\tilde{\theta}_N = \hat{\theta}_N - E_{\rm obs}(\epsilon_N), \qquad (4)$$

while in practice such an approach may demand cumbersome calculations.
However, according to the central limit theorem, if $N$ is large enough one
expects the discrepancy to be negligible ($E_{\rm obs}(\epsilon_N) \simeq 0$), which
means that the ML statistic is asymptotically unbiased. Finally, any result is
warranted only as long as the distribution of the variables involved in the
calculation is correctly described by the model.
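Eq. (4) needs $E_{\rm obs}(\epsilon_N)$, which is rarely available in closed form. One common approximation is the parametric bootstrap: simulate samples from the fitted model and average the error of the ML statistic. This technique is swapped in here and is not taken from the paper. The minimal sketch assumes a Gaussian model and uses the ML variance as $\hat\theta_N$. The function names and data values are hypothetical.

```python
import random


def ml_variance(sample):
    """ML statistic of a Gaussian variance (divides by N)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / n


def shifted_ml_variance(sample, n_boot=2000, seed=0):
    """Eq. (4) with E_obs(eps_N) replaced by a parametric-bootstrap estimate.

    The fitted Gaussian plays the role of the true model: simulate samples of
    the same size, apply the ML statistic, and average its error.
    """
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = ml_variance(sample)
    mean = sum(sample) / n
    sigma_hat = theta_hat**0.5
    total_error = 0.0
    for _ in range(n_boot):
        fake = [rng.gauss(mean, sigma_hat) for _ in range(n)]
        total_error += ml_variance(fake) - theta_hat
    expected_error = total_error / n_boot  # about -theta_hat / n for this statistic
    return theta_hat, theta_hat - expected_error


if __name__ == "__main__":
    data = [-20.3, -21.1, -19.8, -20.6, -21.4, -20.0, -19.5, -20.9, -21.7, -20.2]
    raw, shifted = shifted_ml_variance(data)
    print(f"ML: {raw:.4f}  shifted: {shifted:.4f}  N/(N-1) x ML: {raw * 10 / 9:.4f}")
```

For this statistic the exact shift is $+\hat\theta_N/N$ on average, so the result should be close to $\hat\theta_N(1 + 1/N)$. The simulation route is worthwhile only when no such formula exists.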

The calculation of the mean absolute magnitude of galaxies from a
magnitude-limited sample is a pedagogical example for comparing the ML approach
with the Malmquist (1920) calculation [...]. The statistical model is based on
- a Gaussian luminosity distribution function;
- a uniform spatial distribution;
- and a sharp cutoff at a limiting magnitude $m_{\rm lim}$.

Thus $dP_{\rm th} \propto G(M; M_0, \sigma_M)\,dM\; e^{\beta\mu}\,d\mu$, where
$\beta = 3\ln 10/5$, and the selection function is
$\phi_m(m) = \Theta(m_{\rm lim} - m)$, where $\Theta$ denotes the Heaviside
distribution function. The normalization factor
$P_{\rm th}(\phi_m) \propto \exp[\beta(m_{\rm lim} - M_0) + \beta^2\sigma_M^2/2]$
depends on $M_0$, so the standard statistics are expected to be biased. Indeed, if
$\sigma_M$ is unknown, the ML equations provide us with the following system of
(asymptotically) unbiased statistics

$$M_0 = \langle M \rangle + \beta\sigma_M^2, \qquad (5)$$

$$\sigma_M^2 = \frac{1}{2\beta^2}\left(\sqrt{1 + 4\beta^2\langle (M - M_0)^2\rangle} - 1\right), \qquad (6)$$

which can be solved by Newton's method. Note that the ML approach generalizes
the Malmquist (1920) solution.

† Actually, it is more convenient to use its natural logarithm.
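The sketch below solves Eqs. (5) and (6) with Newton's method on a simulated magnitude-limited sample drawn from the model above. The values $M_0 = -20$, $\sigma_M = 0.5$, $m_{\rm lim} = 13$, the cutoff `mu_max` on the simulated distances and the helper names are hypothetical. Substituting Eq. (5) into the squared form of Eq. (6) leaves a single equation in $s = \sigma_M^2$. Newton's method is applied to that equation with a numerical derivative.

```python
import math
import random

BETA = 3.0 * math.log(10.0) / 5.0  # beta = 3 ln 10 / 5 (uniform spatial distribution)


def simulate_magnitude_limited(M0, sigma_M, m_lim, n, mu_max, seed=3):
    """Draw M ~ G(M; M0, sigma_M), mu ~ exp(beta mu) on (-inf, mu_max]; keep m = M + mu < m_lim.

    mu_max must exceed m_lim - M for any galaxy that could be selected, so that
    only the magnitude cutoff acts.
    """
    rng = random.Random(seed)
    absolute = []
    while len(absolute) < n:
        M = rng.gauss(M0, sigma_M)
        mu = mu_max + math.log(1.0 - rng.random()) / BETA  # inverse CDF of exp(beta mu)
        if M + mu < m_lim:  # selection phi_m(m) = Theta(m_lim - m)
            absolute.append(M)
    return absolute


def solve_ml(absolute, tol=1e-10, max_iter=50):
    """Newton's method on g(s) = beta^2 s^2 + s - <(M - M0(s))^2>, with s = sigma_M^2."""
    n = len(absolute)
    mean = sum(absolute) / n

    def g(s):
        M0 = mean + BETA * s  # Eq. (5)
        second = sum((M - M0) ** 2 for M in absolute) / n
        return BETA**2 * s**2 + s - second  # Eq. (6) squared out: zero at the solution

    s, h = 1.0, 1e-6  # arbitrary starting guess for sigma_M^2; finite-difference step
    for _ in range(max_iter):
        slope = (g(s + h) - g(s - h)) / (2.0 * h)  # numerical g'(s)
        step = g(s) / slope
        s -= step
        if abs(step) < tol:
            break
    return mean + BETA * s, math.sqrt(s)  # (M0, sigma_M)


if __name__ == "__main__":
    sample = simulate_magnitude_limited(-20.0, 0.5, 13.0, 1500, 37.0)
    M0_hat, sigma_hat = solve_ml(sample)
    print(f"<M> = {sum(sample) / len(sample):.3f} (biased), "
          f"ML M0 = {M0_hat:.3f}, sigma_M = {sigma_hat:.3f}")
```

With this particular model the substitution collapses to $s = S^2(M)$, so Newton's method converges after one step. The result is $M_0 = \langle M\rangle + \beta S^2(M)$, i.e., the Malmquist correction evaluated with the sample dispersion. The iteration is kept as a template for models whose equations do not decouple.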



3. ABOUT THE DISTANCE ESTIMATE OF GALAXIES 

The goal is to estimate a distance modulus $\mu$ from the observed apparent
magnitude $m = M + \mu$ and the distance estimator $p$. The estimator gives a rough
estimate of the absolute magnitude, $M \simeq ap + b$, by means of the Tully-Fisher
relation (for spirals) [...] or the Faber-Jackson relation (for ellipticals) [...]. The
distribution of intrinsic quantities is described by
$dP_{\rm th} = K(\mu)\,d\mu\;F(p, M)\,dp\,dM$. Here $K(\mu)$ accounts for the
distribution of galaxies in space and $F(p, M)$ for the distribution in the
$p$-$M$ plane‡. For reasons that will become clear below, we describe the
$p$-$M$ distribution according to two different statistical models

$$F(p,M)\,dp\,dM = G(\zeta; 0, \sigma_\zeta)\,d\zeta \times \begin{cases} f_M(M)\,dM & ({\rm ITF}) \\ f_p(p)\,dp & ({\rm DTF}) \end{cases} \qquad (7)$$

where $\zeta = ap + b - M$ accounts for the intrinsic dispersion about the TF
relation. It is assumed to be Gaussian distributed about zero with standard
deviation $\sigma_\zeta$.

Table 1 gives the related ML statistics of the parameters $a$, $b$ and $\sigma_\zeta$
in terms of the covariance (Cov), the standard deviation ($S$), the mean
($\langle\cdot\rangle$) and the correlation coefficient ($\rho$). It is then clear
that identifying $a$ with the "slope" and $b$ with the "zero-point" of the TF
relation is model dependent. These statistics are valid as long as the working
hypotheses (Constraints) are fulfilled, in particular the absence of $p$-selection
effects. They must be corrected for a bias due to measurement errors, which also
increase the dispersion. For typical samples we obtain estimates with a relative
($1\sigma$) accuracy of 7% for $a$ and 15% for $b$. The simulations show that the
main source of error is the small size of the calibration sample (~30 galaxies)
rather than the measurement errors.
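The Table 1 formulas can be checked on simulated samples. The sketch below is an illustration under stated assumptions and not the authors' simulation. It uses hypothetical TF parameters $a = -7$, $b = -21$ (with $p$ centred on zero) and $\sigma_\zeta = 0.35$, a magnitude limit $m_{\rm lim} = 13$ and $K(\mu) \propto e^{\beta\mu}$. Measurement errors are not simulated. Two samples are generated:

- one from the ITF model, with arbitrary $f_M$ and no $p$-selection;
- one from the DTF model, with Gaussian $f_p$.

Each column of Table 1 should recover the input parameters on the sample drawn from its own model.

```python
import math
import random

BETA = 3.0 * math.log(10.0) / 5.0
A_TRUE, B_TRUE, SIGMA_ZETA = -7.0, -21.0, 0.35  # hypothetical TF parameters (p centred at 0)


def moments(p, M):
    """Sample means, variances S^2, covariance and correlation coefficient rho."""
    n = len(p)
    mp, mM = sum(p) / n, sum(M) / n
    s2p = sum((x - mp) ** 2 for x in p) / n
    s2M = sum((y - mM) ** 2 for y in M) / n
    cov = sum((x - mp) * (y - mM) for x, y in zip(p, M)) / n
    return mp, mM, s2p, s2M, cov, cov / math.sqrt(s2p * s2M)


def itf_statistics(p, M):
    """ITF column of Table 1 (requires no p-selection)."""
    mp, mM, _, s2M, cov, rho = moments(p, M)
    a = s2M / cov
    b = mM - a * mp
    return a, b, math.sqrt(s2M * (1.0 - rho**2)) / abs(rho)


def dtf_statistics(p, M):
    """DTF column of Table 1 (magnitude limit, K(mu) ~ exp(beta mu), Gaussian f_p)."""
    mp, mM, s2p, s2M, cov, rho = moments(p, M)
    a = cov / s2p
    sigma_zeta = math.sqrt(s2M * (1.0 - rho**2))
    return a, mM - a * mp + BETA * sigma_zeta**2, sigma_zeta


def magnitude_limited(draw, n, m_lim=13.0, mu_max=38.5, seed=4):
    """Keep (p, M) pairs with m = M + mu < m_lim, where mu ~ exp(beta mu) on (-inf, mu_max]."""
    rng = random.Random(seed)
    ps, Ms = [], []
    while len(ps) < n:
        p, M = draw(rng)
        mu = mu_max + math.log(1.0 - rng.random()) / BETA
        if M + mu < m_lim:
            ps.append(p)
            Ms.append(M)
    return ps, Ms


def draw_itf(rng):
    # ITF model: any f_M(M) (Gaussian here); p follows from zeta = a p + b - M
    M = rng.gauss(-21.0, 0.8)
    zeta = rng.gauss(0.0, SIGMA_ZETA)
    return (M + zeta - B_TRUE) / A_TRUE, M


def draw_dtf(rng):
    # DTF model: Gaussian f_p(p); M follows from zeta = a p + b - M
    p = rng.gauss(0.0, 0.1)
    zeta = rng.gauss(0.0, SIGMA_ZETA)
    return p, A_TRUE * p + B_TRUE - zeta


if __name__ == "__main__":
    for name, draw in (("ITF sample", draw_itf), ("DTF sample", draw_dtf)):
        p, M = magnitude_limited(draw, n=1000)
        print(name, "ITF:", [round(v, 3) for v in itf_statistics(p, M)],
              "DTF:", [round(v, 3) for v in dtf_statistics(p, M)])
```

Applying the DTF column to the ITF sample, or vice versa, returns a different "slope". This illustrates the model dependence of $a$ and $b$ noted above.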

Hence, the choice of the model must be discussed as a strategy. Indeed, the ITF
model is much less constrained than the DTF one, which makes the related
statistics more robust (see e.g. [...]). On the other hand,

‡ It must be noted that this distribution is different from the one in the TF
diagram, which is described by $dP_{\rm TF} \propto F(p, M)\,dp\,dM \int \phi\,K(\mu)\,d\mu$.
Table 1: Calibration Statistics

                ITF                                               DTF
a               $S^2(M)/{\rm Cov}(p,M)$                           ${\rm Cov}(p,M)/S^2(p)$
b               $\langle M\rangle - a\langle p\rangle$            $\langle M\rangle - a\langle p\rangle + \beta\sigma_\zeta^2$
$\sigma_\zeta$  $|\rho(p,M)|^{-1}\,S(M)\sqrt{1-\rho^2(p,M)}$      $S(M)\sqrt{1-\rho^2(p,M)}$
Constraints     $\phi_p = 1$                                      $\phi_m = \Theta(m_{\rm lim} - m)$; $K(\mu) \propto e^{\beta\mu}$; $f_p(p) = G(p; p_0, \sigma_p)$



one might expect that, in general, the more numerous the working hypotheses, the
more precise the related statistic. The simulations show that the accuracy
increases by 5% in the DTF model. However, if one of these hypotheses is not quite
correct, the estimate is bogus. In practice, this characteristic forces us to prefer
the ITF approach, because of the usual observational conditions. Nevertheless,
both models turn out to show the same robustness if they are improved to take
$p$-selection effects into account (in prep.).

To estimate a likely distance modulus of a galaxy from the same statistical
model, we have to assume that the galaxy belongs to the same population as the
calibration sample. According to the Bayesian scheme§, given the observables
$m = m_k$ and $p = p_k$, the distribution of possible outcomes reads
$dP_{\rm obs}(\mu \mid m_k, p_k) \propto \int_m\int_p \delta(m - m_k)\,\delta(p - p_k)\,dP_{\rm obs}$,
which gives

$$dP_{\rm obs}(\mu \mid m_k, p_k) \propto K(\mu)\,G(\mu; \mu_k, \sigma_\zeta) \times \begin{cases} f_M(m_k - \mu; M_0, \sigma_M) & ({\rm ITF}) \\ 1 & ({\rm DTF}) \end{cases} \; d\mu, \qquad (8)$$

where $\mu_k = m_k - (a p_k + b)$ is model dependent. The mean $\mu^{(e)}$ and the
standard deviation $\sigma_\mu^{(e)}$ of this distribution depend on the working
hypotheses that specify the functions $K$ and $f_M$. The value $\mu^{(e)}$ can be
interpreted as an unbiased estimate of the distance modulus. The difference
between $\mu^{(e)}$ and $\mu_k$ is not a bias of Malmquist type but a volume
correction, since the Dirac distribution functions cancel the dependence of any
selection function on $m$ and on $p$. Finally, if the distribution is not symmetric
about $\mu^{(e)}$, this unbiased distance estimate does not necessarily correspond
to the most probable distance. The latter is defined as the root of the equation
$\partial_\mu\,[dP_{\rm obs}(\mu \mid m_k, p_k)/d\mu] = 0$. Therefore, estimating the
distance of an individual galaxy depends on the choice of a "gambling strategy":
either one minimizes the random error, or one bets on the most likely value within
a given accuracy. According to Eq. (8), the DTF statistic does not require
information on the luminosity distribution function, which makes the related
distance estimate more robust than the ITF one. Therefore, if $p$-selection effects
are absent, it is more convenient to use the ITF model for the calibration step
and the DTF model for the distance estimate. A way to benefit from both
advantages is presented by S. Rauzy (this conference).

§ The Bayesian scheme is preferred to the frequentist scheme [...] because the
sample has a unique element. In the frequentist scheme, $\mu$ is interpreted as a
model parameter of the pd $dP_{\rm obs}(m_k, p_k \mid \mu)$.
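The mean, standard deviation and mode of $dP_{\rm obs}(\mu \mid m_k, p_k)$ can be evaluated numerically. The grid-integration sketch below assumes $K(\mu) \propto e^{\beta\mu}$ and, for the ITF case, $f_M = G(M; M_0, \sigma_M)$. The calibration values and the galaxy ($m_k = 12$, $p_k = 0.1$) are hypothetical. For simplicity the same $a$, $b$ and $\sigma_\zeta$ are used for both models, although in practice each model has its own calibration.

```python
import math

BETA = 3.0 * math.log(10.0) / 5.0


def gauss(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def distance_posterior(m_k, p_k, a, b, sigma_zeta, model, M0=None, sigma_M=None,
                       half_width=3.0, steps=6001):
    """Return (mu_k, mean, sd, mode) of dP_obs(mu | m_k, p_k) evaluated on a grid."""
    if model == "ITF" and (M0 is None or sigma_M is None):
        raise ValueError("the ITF model needs the luminosity function parameters")
    mu_k = m_k - (a * p_k + b)  # model-dependent raw distance modulus
    dx = 2.0 * half_width / (steps - 1)
    grid = [mu_k - half_width + i * dx for i in range(steps)]
    weights = []
    for mu in grid:
        # K(mu) ~ exp(beta mu), written relative to mu_k to avoid overflow (a constant factor)
        w = math.exp(BETA * (mu - mu_k)) * gauss(mu, mu_k, sigma_zeta)
        if model == "ITF":
            w *= gauss(m_k - mu, M0, sigma_M)  # luminosity function f_M(m_k - mu)
        weights.append(w)
    total = sum(weights)
    mean = sum(w * mu for w, mu in zip(weights, grid)) / total
    var = sum(w * (mu - mean) ** 2 for w, mu in zip(weights, grid)) / total
    mode = grid[max(range(steps), key=weights.__getitem__)]
    return mu_k, mean, math.sqrt(var), mode


if __name__ == "__main__":
    for model in ("DTF", "ITF"):
        mu_k, mean, sd, mode = distance_posterior(12.0, 0.1, -7.0, -21.0, 0.35, model,
                                                  M0=-21.0, sigma_M=0.8)
        print(f"{model}: mu_k = {mu_k:.3f}, mean = {mean:.3f}, sd = {sd:.3f}, mode = {mode:.3f}")
```

In the DTF case the mean comes out as $\mu_k + \beta\sigma_\zeta^2$, the volume correction. With these Gaussian and exponential ingredients the distribution is itself Gaussian, so mean and mode coincide. A non-Gaussian $K$ or $f_M$ would separate the unbiased distance from the most probable one.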

If $f_M = G$ and $K(\mu) \propto e^{\beta\mu}$, the two distance estimates coincide
up to corrections controlled by $\gamma = \sigma_\zeta/\sigma_M$, which is a tiny
quantity. The formal comparison of the statistics shows that the discrepancy is a
random variable of zero mean and negligible standard deviation. Moreover, if the
estimate of the mean $M_0$ is restricted to the calibration sample, both models
provide the same distance estimate.
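A short derivation makes the comparison explicit. It assumes $f_M = G(M; M_0, \sigma_M)$ and $K(\mu) \propto e^{\beta\mu}$, and takes $\gamma = \sigma_\zeta/\sigma_M$ (our reading of the definition). In each line, $\mu_k$ and $\sigma_\zeta$ take the values of the corresponding calibration. The sketch is ours and is not meant to reproduce the authors' own expression. The product of an exponential and two Gaussians in $\mu$ is again Gaussian:

```latex
\begin{aligned}
dP^{\rm ITF}_{\rm obs}(\mu \mid m_k, p_k)
  &\propto e^{\beta\mu}\, G(m_k - \mu;\, M_0, \sigma_M)\, G(\mu;\, \mu_k^{\rm ITF}, \sigma_\zeta)\, d\mu
  \;\propto\; G\!\left(\mu;\, \mu^{(e)}_{\rm ITF},\, \frac{\sigma_\zeta}{\sqrt{1+\gamma^2}}\right) d\mu,\\
\mu^{(e)}_{\rm ITF}
  &= \frac{\mu_k^{\rm ITF} + \beta\sigma_\zeta^2 + \gamma^2 (m_k - M_0)}{1+\gamma^2}
   = \mu_k^{\rm ITF} + \beta\sigma_\zeta^2
     - \frac{\gamma^2}{1+\gamma^2}\left[\mu_k^{\rm ITF} + \beta\sigma_\zeta^2 - (m_k - M_0)\right],\\
\mu^{(e)}_{\rm DTF}
  &= \mu_k^{\rm DTF} + \beta\sigma_\zeta^2, \qquad \sigma^{(e)}_{\mu,\,{\rm DTF}} = \sigma_\zeta .
\end{aligned}
```

Since $m_k - \mu_k^{\rm ITF} = a p_k + b$, the bracket is the offset of the galaxy's TF magnitude from $M_0$ (up to $\beta\sigma_\zeta^2$), typically of order $\sigma_M$. The ITF estimate therefore takes the DTF form up to a correction of order $\gamma^2\sigma_M = \gamma\sigma_\zeta$.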